
(CVPR 2018) Who Let The Dogs Out? Modeling Dog Behavior From Visual Data

Keyword [Behavior]

Ehsani K, Bagherinezhad H, Redmon J, et al. Who Let The Dogs Out? Modeling Dog Behavior From Visual Data[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4051-4060.



1. Overview


1.1. Motivation

  • most computer vision tasks study individual components of visual intelligence

This paper instead directly models a visually intelligent agent

  • input visual information, predict the actions of the agent
  • DECADE dataset. ego-centric videos from a dog’s perspective


  • how the dog acts



  • how the dog plans



  • learn from a dog
    the tasks of walkable surface estimation and scene classification, using this dog-modeling task as representation learning



1.2. Definition of the Problems

  • understanding visual data to the extent that an agent can take actions and perform tasks in the visual world

1.3. Dataset

  • mount Inertial Measurement Units (IMUs) on the joints and body of the dog. record the absolute angular orientation and calculate the relative angles of the dog’s main limbs and body (angular displacement represented as a 4-dimensional quaternion vector)
  • mount a camera on the dog’s head. (380 video clips; 24500 frames: 21000 for training, 1500 for validation, and 2000 for testing; various indoor and outdoor scenes, more than 50 different locations)
  • the differences of the angular displacements between two consecutive frames represent the action of the dog in that timestep
  • connect all IMUs to the same embedded system (Raspberry Pi 3)
  • the rates of the joint-movement readings and the video frames differ. perform interpolation and averaging to compute the absolute angular orientation for each frame
  • use K-means clustering to quantize the action space. formulate the problem as classification rather than regression (see the sketch below)
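
A minimal sketch of the pipeline described above, assuming each joint reading is a unit quaternion: the action at each timestep is the relative rotation between consecutive frames, and K-means over the flattened per-joint actions yields discrete action classes. The array shapes, the number of joints, the cluster count K, and the use of NumPy/scikit-learn are illustrative assumptions, not details from the paper.

```python
# Sketch: relative quaternion actions + K-means quantization (shapes and K
# are illustrative assumptions; the paper's exact setup may differ).
import numpy as np
from sklearn.cluster import KMeans

def quat_conj(q):
    # Conjugate of a unit quaternion (w, x, y, z) = inverse rotation.
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def quat_mul(a, b):
    # Hamilton product, batched over leading dimensions.
    w1, x1, y1, z1 = np.moveaxis(a, -1, 0)
    w2, x2, y2, z2 = np.moveaxis(b, -1, 0)
    return np.stack([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ], axis=-1)

rng = np.random.default_rng(0)
q = rng.normal(size=(24500, 6, 4))        # frames x joints x quaternion (toy data)
q /= np.linalg.norm(q, axis=-1, keepdims=True)

# Action at step t: relative rotation taking q_t to q_{t+1}, for every joint.
rel = quat_mul(q[1:], quat_conj(q[:-1]))  # (frames-1, joints, 4)

# Quantize with K-means so action prediction becomes classification.
X = rel.reshape(len(rel), -1)
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)
print(labels.shape, labels[:5])           # one discrete action class per timestep
```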

1.4. Related Work

  • Visual Prediction. (activity forecasting, people’s intent)
  • Sequence to Sequence Models
  • Ego-centric Vision
  • Ego-motion estimation
  • Action Inference & Planning
  • Inverse Reinforcement Learning
  • Self-supervision

1.5. Act like a Dog



  • input. a sequence of frames (1~t)
  • output. a sequence of actions (t+1~N)
  • the ResNet feature extractor’s weights are shared across frames (see the sketch below)
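
A hedged PyTorch sketch of this setup: a shared ResNet-18 encodes each observed frame, an encoder LSTM summarizes the sequence, and a decoder unrolls one discrete action class per future timestep. The layer sizes, the single action head, and feeding back the softmax of the previous prediction are my simplifications; the paper’s architecture (e.g., per-joint heads) may differ.

```python
# Sketch of "act like a dog": CNN features -> encoder LSTM -> action decoder.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ActLikeADog(nn.Module):
    def __init__(self, num_actions=8, hidden=512):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()          # 512-D features; weights shared across frames
        self.backbone = backbone
        self.encoder = nn.LSTM(512, hidden, batch_first=True)
        self.decoder = nn.LSTMCell(num_actions, hidden)
        self.head = nn.Linear(hidden, num_actions)
        self.num_actions = num_actions

    def forward(self, frames, horizon):
        # frames: (B, T, 3, H, W) observed clip; predict `horizon` future actions.
        B, T = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(B, T, -1)
        _, (h, c) = self.encoder(feats)      # summarize frames 1..t
        h, c = h[-1], c[-1]
        prev = frames.new_zeros(B, self.num_actions)
        logits = []
        for _ in range(horizon):             # autoregressive decoding of t+1..N
            h, c = self.decoder(prev, (h, c))
            step = self.head(h)
            logits.append(step)
            prev = torch.softmax(step, dim=-1)  # feed back previous prediction
        return torch.stack(logits, dim=1)    # (B, horizon, num_actions)

model = ActLikeADog()
print(model(torch.randn(2, 5, 3, 224, 224), horizon=5).shape)  # (2, 5, 8)
```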

1.6. Plan like a Dog



  • Input. two frames (1, N)
  • Output. a sequence of actions (2~N-1), see the sketch below
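
A matching sketch for planning: the CNN features of the start frame (1) and goal frame (N) are concatenated and fed to an LSTM cell at every step, which unrolls the in-between action classes. Feeding the same frame pair at each step is my simplification of the recurrence; the feature and hidden sizes are assumptions.

```python
# Sketch of "plan like a dog": (start, goal) frame features -> action sequence.
import torch
import torch.nn as nn

class PlanLikeADog(nn.Module):
    def __init__(self, feat_dim=512, num_actions=8, hidden=512):
        super().__init__()
        self.hidden = hidden
        self.rnn = nn.LSTMCell(2 * feat_dim, hidden)
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, f_start, f_goal, steps):
        # f_start, f_goal: (B, feat_dim) CNN features of frames 1 and N.
        B = f_start.size(0)
        pair = torch.cat([f_start, f_goal], dim=-1)
        h = pair.new_zeros(B, self.hidden)
        c = pair.new_zeros(B, self.hidden)
        logits = []
        for _ in range(steps):               # actions for timesteps 2..N-1
            h, c = self.rnn(pair, (h, c))
            logits.append(self.head(h))
        return torch.stack(logits, dim=1)    # (B, steps, num_actions)

planner = PlanLikeADog()
print(planner(torch.randn(2, 512), torch.randn(2, 512), steps=4).shape)  # (2, 4, 8)
```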

1.7. Learn from a Dog

Compare a ResNet-18 pre-trained on DECADE (input: two frames [t, t+1]; output: the action between them) against a ResNet-18 pre-trained on ImageNet, used as feature extractors for walkable surface estimation and scene classification (see the sketch below).
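
A minimal sketch of this transfer setup: load the action-pretrained ResNet-18, swap its classification head for the target task, and fine-tune. The checkpoint filename and the class count are hypothetical placeholders.

```python
# Sketch: reuse the DECADE-pretrained backbone as a representation.
import torch
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18(weights=None)
# state = torch.load("decade_action_pretrained.pth")   # hypothetical checkpoint
# backbone.load_state_dict(state, strict=False)
backbone.fc = nn.Linear(backbone.fc.in_features, 397)  # placeholder class count

# Either fine-tune everything, or freeze the backbone and train only the head.
for name, p in backbone.named_parameters():
    p.requires_grad = name.startswith("fc")             # linear-probe variant
```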

1.8. Future Work

  • more varied inputs. touch, smell
  • collect data from multiple dogs. evaluate generalization across dogs

1.9. Experiments

1.9.1. Metric

  • class accuracy
  • perplexity (see the sketch below)
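
A small sketch of both metrics under common readings: "class accuracy" as mean per-class accuracy (so rare actions weigh as much as common ones), and perplexity as the exponential of the average negative log-likelihood of the true action. Both interpretations are my assumptions, not definitions from the paper.

```python
# Sketch: mean per-class accuracy and perplexity for discrete action prediction.
import numpy as np

def class_accuracy(preds, labels, num_classes):
    # Average the per-class recalls over the classes that actually occur.
    accs = [np.mean(preds[labels == c] == c)
            for c in range(num_classes) if np.any(labels == c)]
    return float(np.mean(accs))

def perplexity(probs, labels):
    # probs: (N, num_classes) predicted distributions; labels: (N,) true classes.
    nll = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return float(np.exp(nll.mean()))
```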

1.9.2. Learning to Act




1.9.3. Learning to Plan



1.9.4. Learning from a Dog